Semantic networks qualify the meaning of an edge relating any two vertices. Determining which 
vertices are most "central" in a semantic network is difficult because one relationship type may 
be deemed subjectively more important than another. For this reason, research into semantic 
network metrics has focused primarily on context-based rankings (i.e. user prescribed contexts). 
Moreover, many of the current semantic network metrics rank semantic associations (i.e. directed 
paths between two vertices) and not the vertices themselves. This article presents a framework for 
calculating semantically meaningful primary eigenvector-based metrics such as eigenvector centrality 
and PageRank in semantic networks using a modified version of the random walker model of Markov 
chain analysis. Random walkers, in the context of this article, are constrained by a grammar, where 
the grammar is a user defined data structure that determines the meaning of the final vertex ranking. 
The ideas in this article are presented within the context of the Resource Description Framework 
(RDF) of the Semantic Web initiative. 



I. INTRODUCTION 

There exists a large collection of centrality metrics that have been used extensively to rank vertices in single-relational (or unlabeled) networks. Any metric for determining the centrality of a vertex in a single-relational network can be generally defined by the function $f : G^1 \rightarrow \mathbb{R}^{|V|}$, where a single-relational network is denoted $G^1 = (V = \{i, \ldots, j\}, E \subseteq V \times V)$ and the range of $f$ is the rank vector representing the centrality value assigned to each vertex in $V$ [39]. The work in [TO j I2T ] [47]
provide reviews of the many popular centrality measures 
that are currently used today to analyze single-relational 
networks. 

Of particular importance to this article are those metrics that use the primary eigenvector of the network to rank the vertices in $V$ (namely eigenvector centrality [9] and PageRank [33]). If $A \in \mathbb{R}^{|V| \times |V|}$ is the adjacency matrix representation of $G^1$, then the primary eigenvector of $A$ is $\pi$ when $A\pi = \lambda\pi$, where $\lambda$ is the greatest eigenvalue of all eigenvectors of $A$ and $\pi \in \mathbb{R}^{|V|}$ |3B]. The primary eigenvector has been applied extensively to ranking vertices in all types of networks such as social networks [3], scholarly networks of articles [2] and journals [7], and technological networks such as the web citation network [33]. In single-relational networks, the primary eigenvector of the network can be computed using the power method, which simulates the behavior of a collection of random walkers traversing the network [10]. Those vertices that have a higher probability of being traversed by a random walker are the most "central" or "important" vertices. For aperiodic, strongly connected networks, $\pi$ is the eigenvector centrality ranking [9]. For networks that are not strongly connected or are periodic, the network's topology can be altered such that a "teleportation" network is overlaid with $G^1$ to produce an irreducible and aperiodic network for which the power method will yield a real-valued $\pi$. This is the method that was introduced by Brin and Page and is popularly known as the random web-surfer model of the PageRank algorithm [33]. The PageRank algorithm is one of the primary reasons for the (subjectively) successful rankings of web pages from the Google search engine [25].

In a single-relational social network, for example, the 
network data structure can only represent a single type 
of relationship such as friendship. However, in a semantic 
network (or multi-relational network) , the vertices can be 
connected to each other by a heterogeneous set of rela- 
tionships such as friendship, kinship, collaboration, com- 
munication, etc. For a semantic network instance, there 
usually exists an ontology (or schema) which specifies 
how vertex types are related to one another. For exam- 
ple, an ontology may say that a vertex of type human can 
have another vertex of type human as a friend, but a hu- 
man cannot have a vertex of type animal as a friend. An 
ontology is nearly analogous to the object-specifications 
of object-oriented programming minus the method decla- 
rations [37] and loosely related to the schema definitions 
of relational databases. 

The Resource Description Framework (RDF) is a popular data model for explicitly representing semantic networks for distribution and use amongst computers [23] [28] [31]. The Resource Description Framework Schema (RDFS) is a popular ontology language for RDF [TTj. An RDF network can be represented as a triple list $G^n \subseteq (V \times \Omega \times V)$, where $\Omega$ is a set of edge labels denoting the semantic (or meaning) of the relationship between the vertices in $V$, and any ordered triple $(i, \omega, j) \in G^n$ states that vertex $i$ is related to vertex $j$ by the semantic $\omega$. The use of labeled edges complicates the meaning of the rank vector returned by single-relational centrality measures because some vertices may be deemed more central than others with respect to one edge label, but not with respect to another. For example, the relationship isFriendOf may be considered more relevant than livesInSameCityAs. Therefore, due to the number of ways by which two adjacent vertices can be related and the focus on the semantics of such relations, the aim of recent semantic network metrics has been on ranking semantic associations [3] [35] HI], not the vertices themselves. A semantic association between vertices $i$ and $j$ is defined by the ordered multi-set path $q$, where $q = (i, \omega_a, \ldots, \omega_b, j)$, $i, j \in V$, and $\omega_a, \omega_b \in \Omega$ [3]. If $Q_{i,j}$ is the set of all possible semantic associations between vertices $i$ and $j$ in $G^n$, then a path metric function is generally defined as $f : Q_{i,j} \rightarrow \mathbb{R}^{|Q_{i,j}|}$, where the range of $f$ denotes the ranking of each path in $Q_{i,j}$.

This article focuses on vertex ranking, not path ranking. Moreover, this article is primarily interested in eigenvector-based metrics such as eigenvector centrality [9] and PageRank [33]. While eigenvector-based metrics on semantic networks have been proposed to rank vertices, the algorithms rely on prescribed semantic network ontologies and therefore have not been generalized to handle any semantic network instance [30, 38, 48]. This article presents a method for applying eigenvector-based centrality metrics to semantic networks such that the semantic network's ontology is respected. The proposed method extends the random walker model of Markov chain analysis [22] to support its application to semantic network vertex ranking without altering the original data set or isolating subsets of the data set for analysis. This method is called the grammar-based random walker method. While the random walkers of Markov chain analysis are memoryless, grammar-based random walkers of semantic networks utilize a user-defined grammar (or program) that instructs the grammar-based random walker to take particular ontological paths through the semantic network instance. Moreover, a grammar-based random walker maintains a memory of its path in the network and in the grammar in order for it to execute simple logic along its path. This simple logic allows the grammar-based random walker to generate semantically complex eigenvector rankings. For example, given a scholarly semantic network and the grammar-based method, it is possible to calculate $\pi$ over all author vertices such that the authors indexed by $\pi$ are located at some institution and they wrote an article that cites another article of a different author of the same institution.

The next section provides an overview of the class of eigenvector-based metrics for single-relational networks that use the random walker model and then proposes a method for meaningfully applying such metrics to semantic networks. The result is a vertex valuing function generally defined as $f : G^n \times \Psi \rightarrow \mathbb{R}^{\leq |V|}$, where $\Psi$ is a user-defined grammar and $\pi \in \mathbb{R}^{\leq |V|}$.



II. RANDOM WALKERS IN 
SINGLE-RELATIONAL NETWORKS 

The random walker model comes from the field of Markov chain analysis. Markov chains are used to model the dynamics of a stochastic system by explicitly representing the states of the system and the probability of transition between those states [15] [32]. A Markov chain can be represented by a directed weighted network $G^1 = (V, E, w)$, where the vertices in $V$ are system states, $E \subseteq V \times V$ is the set of directed edges representing the transitions between states, and $w : E \rightarrow [0, 1]$ is the function that maps each edge to a real weight value that represents the state transition probability [50]. The outgoing edge weights of any state in the Markov chain form a probability distribution such that $\sum_{e \in \Gamma^+(i)} w(e) = 1 : |\Gamma^+(i)| \geq 1$, where $\Gamma^+(i) \subseteq E$ is the set of outgoing edges of vertex $i$. The future state of the system at time $n + 1$ is based solely on the current state of the system at time $n$ and its respective outgoing edges.

Given that a Markov chain can be represented by a weighted directed network, one can envision a random walker moving from vertex to vertex (i.e. state to state). A random walker moves through the Markov chain by choosing a new vertex according to the transition probabilities outgoing from its current vertex. This process continues indefinitely, where the long-run behavior, or stationary distribution denoted $\pi$, of the random walker makes explicit the probability of the random walker being located at any one vertex at some random time in the future. However, only aperiodic, irreducible, and recurrent Markov chains can be used to generate a $\pi$ that is the stationary distribution of the chain [10]. If the Markov chain is aperiodic, then the random walker does not return to some previous vertex in a periodic manner. A Markov chain is considered recurrent and irreducible if there exists a path from any vertex to any other vertex. In the language of graph theory, the weighted directed network representing the Markov chain must be strongly connected. If $A \in \mathbb{R}^{|V| \times |V|}$ is the weighted adjacency matrix representation of $G^1$ and there exists a vertex vector $\pi \in \mathbb{R}^{|V|}$ where $\sum_{i \in V} \pi_i = 1$ and $A\pi = \lambda\pi$, where $\lambda$ is the greatest eigenvalue of all eigenvectors of $A$, then $\pi$ is the stationary distribution of $G^1$ as well as the primary eigenvector of $A$ [34]. The vector $\pi$ represents the eigenvector centrality values for all vertices in $V$ [9].
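The power method mentioned above can be illustrated with a minimal sketch. It assumes a small, hand-made three-state chain and stores the transition matrix row-wise (entry `A[i][j]` is the probability of moving from state `i` to state `j`), so the update multiplies the current distribution from the left; storing the matrix column-wise would give the equivalent right-eigenvector form. The function name, tolerance, and example chain are illustrative assumptions, not part of the article.

```python
def stationary_distribution(A, tol=1e-12, max_iter=10_000):
    """Power iteration on a row-stochastic matrix A given as a list of lists."""
    n = len(A)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the walk: probability mass flows along outgoing edges.
        nxt = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]  # keep the vector normalized
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical chain: aperiodic (self-loops) and strongly connected.
A = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
pi = stationary_distribution(A)
print(pi)  # approximately [0.25, 0.5, 0.25]
```

Because this example chain is both aperiodic and strongly connected, the iteration settles on a single distribution regardless of the starting vector.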

In the real world, periodicity is highly unlikely in most natural networks [10]. However, a strongly connected network is not always guaranteed. If the network is not strongly connected, then the problem of rank sinks and subset cycles is introduced and $\pi$ is not a real-valued vector. Therefore, many networks require some manipulation to ensure strong connectivity. For example, the web citation network, represented as $G^1 = (V, E)$, is not strongly connected jTHj and therefore, in order to calculate $\pi$ for the web citation network, it is necessary to transform $G^1$ into a strongly connected network. One such method was introduced in [T21 13"5], where a probabilistic web citation network is overlaid with a fully connected web citation network. In matrix form, the probabilistic adjacency matrix of the web citation network, $A \in \mathbb{R}^{|V| \times |V|}$, is created, where

$$A_{i,j} = \begin{cases} \frac{1}{|\Gamma^+(i)|} & \text{if } (i, j) \in E \\ \frac{1}{|V|} & \text{if } |\Gamma^+(i)| = 0 \\ 0 & \text{otherwise.} \end{cases}$$



In $A$, all rank sinks (i.e. vertices with no out degree, absorbing vertices) connect to every other vertex in $V$ with equal probability. Next, the matrix $B$ is created such that $B \in \mathbb{R}^{|V| \times |V|}$ and $B_{i,j} = \frac{1}{|V|}$ for all $i$ and $j$ in $V$. $B$ denotes a fully connected network (i.e. a complete network) where every vertex is connected to every other vertex with equal probability. The composite adjacency matrix $C = \delta A + (1 - \delta)B$, where $\delta \in (0, 1]$ is a parameter weighting the contribution of each adjacency matrix, guarantees that there is some finite probability that each vertex in $V$ is reachable by every other vertex in $V$. Therefore, the network denoted by $C$ is strongly connected and there exists a unique stationary distribution $\pi$ such that $C\pi = \lambda\pi$. This method of inducing strong connectivity is called PageRank and has been used extensively to rank vertices in unlabeled, single-relational networks [25].
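The construction of the probabilistic matrix, the complete-network matrix, and their weighted composite can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name `pagerank`, the four-page example network, and the weighting value 0.85 are hypothetical choices, and the matrices are stored row-wise so that the distribution is multiplied from the left.

```python
def pagerank(out_links, delta=0.85, tol=1e-12, max_iter=10_000):
    """Rank vertices of a single-relational network given as {vertex: [targets]}."""
    vertices = sorted(out_links)
    n = len(vertices)
    index = {v: k for k, v in enumerate(vertices)}

    # A: each row spreads probability evenly over the vertex's outgoing edges;
    # rank sinks (no outgoing edges) connect to every vertex with probability 1/|V|.
    A = [[0.0] * n for _ in range(n)]
    for v, targets in out_links.items():
        i = index[v]
        if targets:
            for t in targets:
                A[i][index[t]] += 1.0 / len(targets)
        else:
            A[i] = [1.0 / n] * n

    # C = delta * A + (1 - delta) * B, where every entry of B is 1/|V|.
    C = [[delta * A[i][j] + (1.0 - delta) / n for j in range(n)] for i in range(n)]

    # Power iteration on the composite matrix.
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * C[i][j] for i in range(n)) for j in range(n)]
        done = sum(abs(a - b) for a, b in zip(nxt, pi)) < tol
        pi = nxt
        if done:
            break
    return dict(zip(vertices, pi))


# Hypothetical web citation network: page "d" has no incoming links.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(ranks)
```

In this example, page "d" is only ever reached through the overlaid complete network, so its rank equals its share of the teleportation probability.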

The primary contribution of this article is that it ports 
the eigenvector-based algorithms of single-relational net- 
works over to the semantic network domain. This article 
presents a method for calculating a semantically mean- 
ingful stationary distribution within some subset of a se- 
mantic network (called grammar-based eigenvector cen- 
trality) as well as how to implicitly induce strong con- 
nectivity irrespective of the network's topology (called 
grammar-based PageRank). This general method is 
called the grammar-based random walker model because 
a random walker does not blindly move from vertex to 
vertex, but instead is constrained by a grammar that ensures that the stationary distribution is calculated in a "grammatically correct" subset of $G^n$. Before discussing
the grammar-based random walker method, the next sec- 
tion provides a brief review of semantic networks, ontolo- 
gies, and current standards for their representation. 



III. SEMANTIC NETWORKS 

A semantic network is also known as a multi-relational network or directed labeled network. In a semantic network, there exists a heterogeneous set of vertex types and a heterogeneous set of edge types such that any two vertices in the network can be connected by zero or more edges. In order to make a distinction between two edges connecting the same vertices, a label denotes the meaning, or semantic, of the relationship. A semantic network can be represented by the triple list $G^n \subseteq (V \times \Omega \times V)$. A vertex-to-vertex relationship is called a triple because there exists the relationship $(i, \omega, j)$, where $i \in V$ is called the subject, $\omega \in \Omega$ is called the predicate, and $j \in V$ is called the object.
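As an illustration of the triple list representation, the following sketch stores a tiny semantic network as a set of subject-predicate-object triples and recovers the vertex set and the labels connecting a given pair of vertices. The helper names and the `lanl:worksWith` label are hypothetical and are used here only to show two distinct edges between the same vertices.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# A small semantic network: two vertices joined by differently labeled edges.
G = {
    Triple("lanl:marko", "lanl:hasFriend", "lanl:johan"),
    Triple("lanl:marko", "lanl:worksWith", "lanl:johan"),  # hypothetical label
    Triple("lanl:johan", "lanl:hasFriend", "lanl:marko"),
}


def vertices(graph):
    """All subjects and objects appearing in the triple list."""
    return {t.subject for t in graph} | {t.object for t in graph}


def edge_labels(graph):
    """All predicates (edge labels) appearing in the triple list."""
    return {t.predicate for t in graph}


def labels_between(graph, i, j):
    """Edge labels of every triple directed from vertex i to vertex j."""
    return {t.predicate for t in graph if t.subject == i and t.object == j}


print(sorted(vertices(G)))
print(sorted(labels_between(G, "lanl:marko", "lanl:johan")))
```

The label on each triple is what lets the two edges from the first vertex to the second be told apart, which a single-relational edge set could not do.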

Perhaps the most popular standard for representing semantic networks is the Resource Description Framework (RDF) of the Semantic Web initiative [23] HE]. There currently exist many applications to support the creation, query, and manipulation of RDF-based semantic networks. High-end, modern-day triple-stores (RDF databases) can reasonably support on the order of $10^9$ triples pQ. For this reason, and due to the fact that RDF is becoming a common data model for various disciplines including digital libraries [4], bioinformatics [ITj, and computer science [39], all of the constructs of the grammar-based random walker model will be presented according to RDF and its ontology modeling language RDFS.

RDF identifies vertices in a semantic network by Uniform Resource Identifiers (URI) [5], literals, or blank nodes (also called anonymous nodes), and edge labels are represented by URIs. An example RDF triple where all components are URIs is

(lanl:marko, lanl:hasFriend, lanl:johan).

In this triple, lanl is a namespace prefix that represents http://www.lanl.gov. This prefix convention is used throughout the article to ensure brevity of text and diagram clarity. Figure 1 is a graphic representation of the previous triple.

(Figure 1 labels: lanl:marko, lanl:hasFriend, lanl:johan.)

FIG. 1: An example triple in RDF.

Another example of a triple where the object is a literal is

(lanl:marko, lanl:hasFirstName, "Marko"^^xsd:string).

In this triple, the literal "Marko"^^xsd:string is an XML schema datatype string (xsd) [6].

While a semantic network instance is represented in 
pure RDF, a semantic network ontology is represented in 
RDFS (a language represented in RDF). 



A. Ontologies 

Due to the heterogeneous nature of the vertices and edges in a semantic network, an ontology is usually defined as a way of specifying the range of possible interactions between the vertices in the network. Ontologies articulate the relation between abstract concepts and make no explicit reference to the instances of those classes [35]. For example, the ontology for the web citation network can be defined by a single class representing the abstract concept of a web page and the single semantic relationship representing a web link or citation (i.e. href). This simple ontology states that the network representing the semantic model of the web is constrained to only instances of one class (a web page) and one relationship (a web link).

Given the previous single triple represented in Figure 1, the semantic network ontology could be represented as diagrammed in Figure 2, where the lanl:hasFriend property must have a domain of lanl:Human and a range of lanl:Human, and where lanl:marko and lanl:johan are both lanl:Humans.

(Figure 2 labels: lanl:Human, rdf:type, rdfs:range, lanl:hasFriend, lanl:marko, lanl:johan, with an ontology/instance divider.)

FIG. 2: An example of the relationship between an ontology and its instance.

Note that ontological diagrams can be abbreviated by assuming that the tail of an edge is the rdfs:domain and the head of the edge is the rdfs:range. This abbreviated form is diagrammed in Figure 3.

(Figure 3 labels: lanl:hasFriend, lanl:johan.)

FIG. 3: An abbreviation of the diagram in Figure 2.

In general, the relationship between an ontology and its corresponding semantic network instantiation is depicted in Figure 4, where the rdf:type property denotes that the vertices in $V$ are instances of some abstract class in the ontology.



(Figure 4 labels: Semantic Network Instance, rdf:type, Ontology.)

FIG. 4: The relationship between a semantic network instance and its ontology.

RDFS does not provide a large enough vocabulary to describe many of the types of relations needed for modeling class interactions [24]. For this reason, other modeling languages, based on RDFS, have been developed, such as the Web Ontology Language (OWL) [Ml US]. OWL allows a modeler to represent restrictions on properties (e.g. cardinality) and provides a broader range of property types (e.g. inverse relationships, functional relationships). Even though RDFS is limited in its expressiveness, it will be used as the modeling language for describing the grammar-based random walker ontology. Note that it is trivial to map the presented concepts over to other modeling languages such as OWL. For a more in-depth review of ontology modeling languages, their history, and their application, please refer to [Mj and [20].

The next section brings together the concepts of ran- 
dom walkers, semantic networks, and ontologies in order 
to formalize this article's proposed grammar-based ran- 
dom walker model. 



IV. GRAMMAR-BASED RANDOM WALKERS 

A grammar-based random walker moves through a se- 
mantic network in a manner that respects the labels of 
the edges connecting the network's vertices. The purpose 
of the grammar-based random walker is to identify the 
stationary distribution of some subset of the full seman- 
tic network (i.e. the primary eigenvector of a sub-network 
of the network). Unlike the random walkers of Markov 
chain analysis, a grammar-based random walker does not 
take any outgoing edge from its current vertex, but instead, depending on the user-defined grammar, traverses particular edge types to particular vertex types.

Any designed grammar uses the constructs and algorithms defined by the grammar ontology (prefixed as rwr). The grammar ontology defines rule classes, attribute classes, data structures, and properties that are intended to be combined with instances and classes of $G^n$ to create a $G^n$-specific grammar denoted $\Psi$. The rules of the grammar ultimately determine which vertices in $V$ are indexed by the returned rank vector $\pi$. The rank vector $\pi$ is created by a set of grammar-based random walkers $P$ traversing through $G^n$ and obeying $\Psi$. Figure 5 diagrams the relationship between $\Psi$, $G^n$, and their respective ontologies. Note that $\Psi$, $\Psi$'s ontology, $G^n$, and $G^n$'s ontology are all semantic networks and thus can be represented by the same semantic network data structure. However, in order to make the separation between the components clear, each data structure will be discussed as a separate semantic network.
FIG. 5: The grammar-based random walker architecture.

The meaning of the vertex rank vector $\pi$ of the grammar-based model, both semantically and theoretically, depends primarily on the grammar used. Some grammars will generate a $\pi$ that is the stationary probability distribution of some subset of $G^n$, while others will be more representative of a discrete form of the spreading activation models, where calculating the long-run behavior of the random walker is undesirable jTSJ [T71 [TSJ U5]. In practice, determining whether $\pi$ is a stationary distribution of the analyzed subset of $V$ is a matter of determining whether the subset of $G^n$ that is traversed by $P$ is strongly connected and the normalized $\pi$ has converged to a stable set of values. Any grammar-based random walker implementation is a function generally defined as $f : G^n \times \Psi \rightarrow \mathbb{R}^{\leq |V|}$.

It is noted that there exist two related ontologies for modeling the distribution of discrete entities in a semantic network: the marker passing Petri net ontology of [19] and the particle swarm ontology of [55]. These ontologies were inspirational to the ideas presented in this article. However, both ontologies were designed for a different application space. The first is for Petri net algorithms while the latter was defined specifically for collective decision making systems. Finally, the grammar-based model presented in jiD] for calculating geodesics in a semantic network combined with the grammar-based model presented in this article form a unified framework for porting many of the popular single-relational network analysis algorithms over to the semantic network domain (more specifically, the RDF and Semantic Web domain).



A. The Grammar-Based Random Walker Ontology 

The complete grammar ontology is graphically represented in Figure 6, where squares are rdfs:Classes and edge labels are rdf:Property types. The tail of each edge is the rdfs:domain of the rdf:Property and the head is the rdfs:range. For the purpose of diagram clarity, the dashed edges denote a relationship of rdfs:subClassOf. Finally, note that the two dashed squares should be instances or classes that are in $G^n$ or its ontology, respectively.

The grammar ontology follows a convention similar to most object-oriented programming languages [43] in that a rwr:Context (i.e. class) has a set of attributes (i.e. fields) and rules (i.e. methods). The general idea is that any grammar instance $\Psi$ is a collection of rwr:Context objects connected to one another by rwr:Traverse rules. rwr:Contexts and their rwr:Traverse rules are an abstract model of what triples a grammar-based random walker can traverse in $G^n$. The rwr:Is and rwr:Not attributes further constrain the types of vertices that can be traversed by the random walker and are used for path "bookkeeping" and path logic. The rwr:IncrCount and rwr:SubmitCounts rules determine which vertices in $V$ should be indexed by $\pi$. Finally, the rwr:Reresolve rule is the means by which the random walker is able to "teleport" to other regions of $G^n$. The rwr:Reresolve rule is used to model the PageRank algorithm and therefore is a mechanism for guaranteeing that the subset of $G^n$ that is traversed is strongly connected and $\pi$ is a stationary distribution.

FIG. 6: The complete grammar-based random walker ontology.



B. High-Level Overview of the Grammar-Based 
Model 



This section will provide a high-level overview of the components of the grammar diagrammed in Figure 6. $\Psi$ is a user-defined data structure that is created specifically for $G^n$ and $G^n$'s respective ontology. Any $\Psi$ must obey the constraints defined by the grammar ontology diagrammed in Figure 6. A single grammar-based random walker (denoted $p \in P$) "walks" both $G^n$ and $\Psi$ in order to dynamically generate a vertex rank vector denoted $\pi$. If the $p$-traversed subset of $G^n$ is strongly connected, then only a single random walker is needed to compute $\pi$ m-.

When random walker $p \in P$ is at some rwr:Context in $\Psi$, the rwr:Context is "resolved" to a particular vertex in $V$. This is the relationship between $\Psi$ and $G^n$. For example, if $p$ is at some rwr:Context in $\Psi$ that is rwr:forResource lanl:Human, then $p$ must also be at some vertex in $V$ that is of rdf:type lanl:Human. Thus, $\Psi$ is an abstract representation of the legal vertices that $p$ can traverse in $V$. When $p$ is at a rwr:Context, $p$ will execute the rwr:Context's collection of rwr:Rules, while at the same time respecting the rwr:Context's rwr:Attributes. The collection of rwr:Rules is an ordered rdf:Seq. This means that $p$ must execute the rules in their specified sequence. This is represented as the set of properties rdf:_1, rdf:_2, rdf:_3, etc. (i.e. rdfs:subPropertyOf rdfs:ContainerMembershipProperty).

Any grammar-based random walker $p$ has three local variables:

• a reference to its path history in $G^n$ (denoted $g^p$)

• a reference to its path history in $\Psi$ (denoted $\psi^p$)

• a local vertex vector (denoted $\pi^p \in \mathbb{N}^{\leq |V|}$)

and a reference to a single global variable:

• a global vertex vector (denoted $\pi \in \mathbb{N}^{\leq |V|}$)

The path history $g^p$ is an ordered multi-set of vertices, edge labels, and edge directionalities. If the random walker $p$ traversed the path diagrammed in Figure 1 from left to right, then $g^p = \{$lanl:marko, lanl:hasFriend, $+$, lanl:johan$\}$. Note that $g^p_0 =$ lanl:marko, $g^p_{1'} =$ lanl:hasFriend, $g^p_{1''} = +$, and $g^p_1 =$ lanl:johan, where $n'$ denotes the edge label used to get to the vertex at time $n$ and $n''$ denotes the direction that $p$ traversed over that edge. In the grammar-based random walker model, a random walker can, if stated in $\Psi$, oppose an edge's directionality. For example, if $p$ had traversed the edge diagrammed in Figure 1 from right to left, then $g^p = \{$lanl:johan, lanl:hasFriend, $-$, lanl:marko$\}$. A similar convention holds for $p$'s $\Psi$-history $\psi^p$. However, in $\psi^p$ the vertices are rwr:Contexts, the edge labels are the rdf:Property of the rwr:Edge chosen, and the directionalities are determined by whether an rwr:OutEdge or rwr:InEdge was traversed.
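One possible way to hold such a path history is sketched below. The class `PathHistory` and its method names are hypothetical; the sketch only assumes that the vertex reached at time n is stored together with the edge label and direction used to reach it, and that no label or direction is recorded for the entry vertex at time 0.

```python
from dataclasses import dataclass, field


@dataclass
class PathHistory:
    """Ordered record of vertices plus the label and direction used to reach each one."""
    vertices: list = field(default_factory=list)
    labels: list = field(default_factory=list)      # labels[n] led to vertices[n]
    directions: list = field(default_factory=list)  # "+" with the edge, "-" against it

    def enter(self, vertex):
        # Time 0: the walker enters without an intervening edge.
        self.vertices.append(vertex)
        self.labels.append(None)
        self.directions.append(None)

    def step(self, label, direction, vertex):
        # Time n + 1: record how the new vertex was reached.
        self.labels.append(label)
        self.directions.append(direction)
        self.vertices.append(vertex)

    def flatten(self):
        """Return the history as one ordered sequence, as written in the text."""
        flat = [self.vertices[0]]
        for n in range(1, len(self.vertices)):
            flat += [self.labels[n], self.directions[n], self.vertices[n]]
        return flat


forward = PathHistory()
forward.enter("lanl:marko")
forward.step("lanl:hasFriend", "+", "lanl:johan")

backward = PathHistory()
backward.enter("lanl:johan")
backward.step("lanl:hasFriend", "-", "lanl:marko")

print(forward.flatten())
print(backward.flatten())
```

Keeping the three sequences aligned by time step makes a lookup such as "the vertex from m steps ago" a simple index operation.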

The "walking" aspect of $p$ for both $\Psi$ and $G^n$ is governed by the rwr:Traverse rule. When $p$ executes a rwr:Traverse rule in $\Psi$, it selects a particular rwr:Edge to traverse. For rwr:OutEdges, a triple in $G^n$ is selected where the subject is its current location $g^p_n$, and the predicate and object are instances of the respective resources specified by the rwr:OutEdge (rwr:hasPredicate and rwr:hasObject). For rwr:InEdges, a triple in $G^n$ is selected where $g^p_n$ is the object of the triple and the subject and predicate are instances of the resources specified by the rwr:InEdge (rwr:hasPredicate and rwr:hasSubject). The rwr:Context chosen is $\psi^p_{n+1}$ and the rdfs:Resource of the triple $(\psi^p_{n+1}, \text{rwr:forResource}, ?x) \in \Psi$ determines $g^p_{n+1}$, where $?x$ is any class in $G^n$'s ontology or instance in $G^n$'s vertex set $V$. The newly chosen $g^p_{n+1}$ is called the resolution of $\psi^p_{n+1}$.

The rwr:IncrCount and rwr:SubmitCounts rules affect the random walker's local vertex vector $\pi^p$ and the global vertex vector $\pi$, respectively. The distinction between $\pi^p$ and $\pi$ is that $\pi^p$ is a temporary counter that is not submitted to the global counter $\pi$ until the rwr:SubmitCounts rule has been executed. The walker $p$ does not submit its vertex counts until it has determined that it is in a $\Psi$-correct subset of $G^n$.
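The split between a walker's temporary counts and the shared global counts can be sketched as follows. The class and method names are hypothetical, and two behaviors are assumptions made for the illustration rather than statements from the text: the local counter is cleared after it is submitted, and a walker that abandons a path simply discards its local counts.

```python
from collections import Counter


class WalkerCounts:
    """Local vertex counts for one walker, with a reference to the shared global counts."""

    def __init__(self, global_counts):
        self.local = Counter()
        self.global_counts = global_counts

    def incr_count(self, vertex):
        # Increment only the temporary, walker-local counter.
        self.local[vertex] += 1

    def submit_counts(self):
        # Add the local counts to the global vector.
        self.global_counts.update(self.local)
        self.local.clear()  # assumption: local counts start over after a submit

    def discard(self):
        # Assumption: counts from a path that was not completed are dropped.
        self.local.clear()


global_counts = Counter()
walker = WalkerCounts(global_counts)
walker.incr_count("lanl:marko")
walker.incr_count("lanl:marko")
walker.incr_count("lanl:johan")
before_submit = dict(global_counts)  # still empty at this point
walker.submit_counts()
print(before_submit, dict(global_counts))
```

Deferring the update this way means that vertices visited along a path that later turns out to violate the grammar never reach the global vector.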

The process of moving $p$ through a semantic network and allowing it to increment a counter for specific vertices continues until the ratios between the values of the global $\pi$ converge. Note that $\pi$ does not provide a probability distribution, since $\sum_{i \in \pi} \pi_i \neq 1$. Instead, $\pi$ represents the number of times an indexed vertex of $\pi$ has been counted by a grammar-based random walker. Therefore, to determine the probability of being at any one vertex that is indexed by $\pi$, $\pi$ can be normalized to generate a new vector denoted $\pi' \in \mathbb{R}^{\leq |V|}$, where $\pi'_i = \frac{\pi_i}{\sum_{j \in \pi} \pi_j}$. If $\pi'$ is the normalization of $\pi$ then, when $\pi'$ no longer changes with successive executions of the rwr:SubmitCounts rule, the process is complete. More formally, if $\epsilon \in \mathbb{R}$ is an argument specifying the smallest change accepted for convergence consideration, then the grammar-based random walker algorithm is complete when $\|\pi'_n - \pi'_m\|_2 < \epsilon$, where $n$ and $m$ are the time steps of consecutive calls to rwr:SubmitCounts. However, like Markov chains, this convergence will only occur if the subset of $G^n$ that is traversed is strongly connected and aperiodic. If the traversed subset of $G^n$ is not strongly connected or is periodic, then the rwr:Reresolve rule can be used to simulate grammar-based random walker "teleportation". With the inclusion of the rwr:Reresolve rule, a grammar-based PageRank can be executed on $G^n$.
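The normalization and stopping test described above can be sketched in a few lines. The function names and the example counts are hypothetical; the sketch assumes the global counts are held in a dictionary keyed by vertex and compares two consecutive normalized snapshots using the Euclidean norm.

```python
import math


def normalize(counts):
    """Turn raw vertex counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {v: 0.0 for v in counts}
    return {v: c / total for v, c in counts.items()}


def has_converged(prev, curr, epsilon):
    """True when the Euclidean distance between two normalized snapshots is below epsilon."""
    keys = set(prev) | set(curr)  # a vertex may appear in only one snapshot
    dist = math.sqrt(sum((curr.get(k, 0.0) - prev.get(k, 0.0)) ** 2 for k in keys))
    return dist < epsilon


# Hypothetical global counts at two consecutive submissions.
snapshot_n = normalize({"a": 50, "b": 30, "c": 20})
snapshot_m = normalize({"a": 500, "b": 301, "c": 199})

print(has_converged(snapshot_n, snapshot_m, epsilon=0.01))   # True
print(has_converged(snapshot_n, snapshot_m, epsilon=0.001))  # False
```

Comparing normalized snapshots rather than raw counts matters because the raw counts keep growing even after the proportions have stabilized.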

The next section formalizes each of the rwr:Rules and rwr:Attributes of the grammar ontology.



V. THE RULES AND ATTRIBUTES OF THE 
GRAMMAR ONTOLOGY 

The following rwr:Rules and rwr:Attributes are presented in a set theoretic form that borrows much of its structure from semantic query languages such as SPARQL j3S]. The query triple $(?x, \text{rdf:type}, \text{lanl:Author}) \in G^n$ will bind $?x$ to any lanl:Author in the semantic network $G^n$. The $?x$ notation represents that $?x$ is a variable that is bound to any vertex (i.e. URI) that matches the query pattern. The same query can return many resources that bind to $?x$. In such cases, the results are returned as a set. Thus $X = \{?x \mid (?x, \text{rdf:type}, \text{lanl:Author}) \in G^n\}$ denotes the set of all vertices in $V$ that are of rdf:type lanl:Author.
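The variable-binding notation can be illustrated with a small pattern matcher over a triple list. This is a sketch under stated assumptions, not a SPARQL implementation: the function `match`, the example triples, and the `lanl:wrote` and `lanl:Article` names are hypothetical, and a term is treated as a variable only when it starts with a question mark.

```python
def match(graph, pattern):
    """Return one binding dictionary per triple that matches a single triple pattern."""
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable binds to the value; a repeated variable must agree.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


G = {
    ("lanl:marko", "rdf:type", "lanl:Author"),
    ("lanl:johan", "rdf:type", "lanl:Author"),
    ("lanl:paper1", "rdf:type", "lanl:Article"),  # hypothetical resource
    ("lanl:marko", "lanl:wrote", "lanl:paper1"),  # hypothetical predicate
}

# The set of all vertices bound to ?x by the query pattern.
X = {b["?x"] for b in match(G, ("?x", "rdf:type", "lanl:Author"))}
print(sorted(X))
```

Collecting the bindings into a set mirrors the set-builder form used in the text, where one query pattern can bind the variable to many resources.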

The following subsections present each of the rwr:Rules and rwr:Attributes that a grammar-based random walker must execute and respect during its journey through both $\Psi$ and $G^n$.
A. Entering $\Psi$ and $G^n$

Every random walker "walks" both $\Psi$ and $G^n$ in parallel. However, before a walker can walk either data structure, it must enter both $\Psi$ and $G^n$. The entry points of $\Psi$ are rwr:EntryContexts and are represented by the set $s(\Psi)$, where

$$s(\Psi) = \{?x \mid (?x, \text{rdf:type}, \text{rwr:EntryContext}) \in \Psi\}.$$

The starting location $\phi \in s(\Psi)$ of $p$ is chosen with probability $\frac{1}{|s(\Psi)|}$. Once some $\phi$ is chosen, $\psi^p_0 = \phi$ (time $n$ starts at 0). An entry location into $V$ can be determined by randomly selecting some vertex $i \in s(V \mid \phi)$, where $s(V \mid \phi)$ is the set of all $i \in V$ given that $i$ is a proper resolution of the rwr:EntryContext $\phi$. Thus,

$$s(V \mid \phi) = \{?i \mid (\phi, \text{rwr:forResource}, ?z) \in \Psi \wedge ((?i, \text{rdf:type}, ?z) \in G^n \vee ?i = ?z)\},$$

where type inheritance is strictly followed. For instance, if $i$ is an rdf:type of $z$, then $i$ is an instance of $z$ or an instance of $u$ where $u$ is a rdfs:subClassOf $z$. This is subsumption in RDFS reasoning and will be used repeatedly throughout the remainder of this article.

Given the set $s(V \mid \phi)$, the probability of $p$ choosing some $i \in s(V \mid \phi)$ is $\frac{1}{|s(V \mid \phi)|}$. The chosen vertex $i$ becomes the starting location of $p$ in $G^n$ and thus, $g^p_0 = i$. Note that $g^p_{0'} = \emptyset$, $\psi^p_{0'} = \emptyset$, $g^p_{0''} = \emptyset$, and $\psi^p_{0''} = \emptyset$ since a random walker enters both $\Psi$ and $G^n$ at a vertex without using an intervening edge label or directionality. Figure 7 depicts how rwr:EntryContexts in $\Psi$ are related to vertices in $G^n$.

(Figure 7 labels: rwr:EntryContext, rdf:type, rwr:forResource, $G^n$ ontology, $s(V \mid \phi)$, $G^n$.)

FIG. 7: The relationship between rwr:EntryContexts in $\Psi$, $G^n$, and $G^n$'s ontology.

B. The rwr:Not Attribute

Before presenting the rwr: Traverse rule, it is im- 
portant to discuss the two attributes that constrain the 
rwr: Traverse rule: namely, rwr: Not and rwr: Is. This 
subsection will discuss the rwr : Not attribute. The next 
section will discuss the rwr: Is attribute. The rwr: Not 
atttribute ensures that the random walker p does not tra- 
verse an edge to a particular, previously seen vertex in 
g p . Any rwr : Not attribute is the subject of a triple with 
a predicate rwr : steps and literal m £ N. The literal m 
denotes which vertex from m-steps ago p must avoid. In 
other words, p must not have a g v n+1 that equals g v n _ m - 
Thus, the rwr: Context V'n+i cannot resolve to 3^_ m - If 

M ={?m | (V4+i,rwr :hasAttributes, ?a;) £ * 
A rwr :hasAttribute, ?y) £ * 
A (?y, rdf : type, rwr : Not) £ * 
A (?y, rwr : steps, ?m) £ ^}, 



then 



*(P) 



Tt + l 



u 

mEAI 



V 

9n—m 7 



where X(p) n+1 C V and A(p) rl+ i n .gJJ +1 = 0. The set 
X(p) n+ i is the set of vertices in V that must not 

equal. 

The rwr:Not attribute is useful when $p$ must not return to a vertex in $V$ that has been previously visited. Imagine that $p$ is determining whether or not a particular article has at least two authors (or must traverse an implicit coauthorship network). Such an example is depicted in Figure 8 where the numbered circles are the location of $p$ at particular time steps and author vertices are only connected to their authored articles. If, at $n = 1$, $p$ is located at lanl:marko then $p$ will traverse the lanl:wrote predicate to the lanl:DDD article. If $p$ is checking for another author that is not lanl:marko then $p$ can only take the lanl:wrote predicate to lanl:dsteinbock. If lanl:DDD only had one author, then $p$ would be stuck (i.e. halt) at lanl:DDD since no legal lanl:wrote predicate could be traversed. At that point, it is apparent that the article has only one author. Moreover, by traversing to lanl:dsteinbock and not back to lanl:marko at $n = 3$, a coauthorship network is implicitly traversed.
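
The avoid-set construction above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that the walker's vertex history is kept as a list indexed by time step and that $G^n$ is held as a set of (subject, predicate, object) tuples; the helper names `not_set` and `coauthor_candidates` are hypothetical and not part of the grammar ontology.

```python
def not_set(history, n, steps):
    """Return X(p)_{n+1}: the vertex visited m steps before time n, for each m in steps."""
    return {history[n - m] for m in steps if 0 <= n - m < len(history)}


def coauthor_candidates(graph, article, avoid):
    """Authors of `article` (subjects of lanl:wrote triples) that are not in the avoid set."""
    return {s for (s, p, o) in graph
            if p == "lanl:wrote" and o == article and s not in avoid}


# Toy network mirroring the example: lanl:DDD has two authors.
G = {
    ("lanl:marko", "lanl:wrote", "lanl:DDD"),
    ("lanl:dsteinbock", "lanl:wrote", "lanl:DDD"),
}

# List index = time step; index 0 is an unused placeholder.
# p is at lanl:marko at n = 1 and at lanl:DDD at n = 2.
history = [None, "lanl:marko", "lanl:DDD"]

avoid = not_set(history, n=2, steps={1})          # rwr:steps literal m = 1
candidates = coauthor_candidates(G, "lanl:DDD", avoid)
```

With a single-author article the candidate set is empty, which corresponds to the halting case described in the text.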



FIG. 8: An example situation for the rwr:Not attribute

C. The rwr:Is Attribute

Unlike the rwr:Not attribute, the rwr:Is attribute is used to ensure that the random walker $p$ does, in fact, traverse an edge to a previously visited vertex in $V$. Any rwr:Is attribute is the subject of a triple with a predicate rwr:steps and literal $m \in \mathbb{N}$. The literal $m$ denotes which vertex from $m$ steps ago $p$ must traverse to. If the set of vertices returned by the rwr:Is attributes contains more than one vertex, then $p$ must traverse to one of the vertices from the set. Thus, the random walker $p$ must have vertex $g^p_{n+1}$ equal some $g^p_{n-m}$. In other words, the rwr:Context $\psi^p_{n+1}$ must resolve to some $g^p_{n-m}$. If

$$
\begin{aligned}
M = \{\, ?m \mid\; & (\psi^p_{n+1}, \texttt{rwr:hasAttributes}, ?x) \in \Psi \\
& \wedge\, (?x, \texttt{rwr:hasAttribute}, ?y) \in \Psi \\
& \wedge\, (?y, \texttt{rdf:type}, \texttt{rwr:Is}) \in \Psi \\
& \wedge\, (?y, \texttt{rwr:steps}, ?m) \in \Psi \,\},
\end{aligned}
$$

then

$$O(p)_{n+1} = \bigcup_{m \in M} g^p_{n-m},$$

where $O(p)_{n+1} \subseteq V$ and $g^p_{n+1} \in O(p)_{n+1}$. Again, unless $O(p)_{n+1} = \emptyset$, one of the vertices in $O(p)_{n+1}$ must be $p$'s location in $G^n$ at $n + 1$.

The rwr:Is attribute is useful when $p$ must search particular properties of a vertex and later return to the original vertex. For instance, imagine the triple $(\texttt{lanl:LANL}, \texttt{rdf:type}, \texttt{lanl:Laboratory}) \in G^n$ as depicted in Figure 9 where the numbered circles represent $p$'s location at particular time steps $n$. Assume that $p$ is at the lanl:LANL vertex at $n = 1$ and $p$ must check to determine if lanl:LANL is, in fact, a lanl:Laboratory. In order to do so, $p$ must traverse the rdf:type predicate to arrive at lanl:Laboratory at $n = 2$. At $n = 3$, $p$ should return to the original lanl:LANL vertex. Without the rwr:Is attribute, $p$ has the potential for choosing some other lanl:Laboratory, such as lanl:PNNL. Once back at lanl:LANL, it is apparent that lanl:LANL is a lanl:Laboratory and $p$ can move to some other vertex at $n = 4$.

FIG. 9: An example situation for the rwr:Is attribute

D. The rwr:Traverse Rule

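The rwr:Traverse rule allows the random walker $p$ to traverse to a new rwr:Context in $\Psi$ and a new vertex in $V$. If there exists some rwr:Context $\phi$ with the rwr:Traverse rule $t$, then when $g^p_n = a$ and $\psi^p_n = \phi$, the probability of $p$ traversing some outgoing triple from $a$ or some incoming triple to $a$ is $\frac{1}{|\Gamma(a,p)|}$, where if

$$
\begin{aligned}
Y_{out} = \{\, ?y \mid\; & (t, \texttt{rwr:hasEdge}, ?y) \in \Psi \\
& \wedge\, (?y, \texttt{rdf:type}, \texttt{rwr:OutEdge}) \in \Psi \,\}, \\
Y_{in} = \{\, ?y \mid\; & (t, \texttt{rwr:hasEdge}, ?y) \in \Psi \\
& \wedge\, (?y, \texttt{rdf:type}, \texttt{rwr:InEdge}) \in \Psi \,\},
\end{aligned}
$$

$$
\begin{aligned}
\Gamma^+(a,p) = \bigcup_{y \in Y_{out}} \{\, (a, ?\omega, ?b) \mid\; & (a, ?\omega, ?b) \in G^n \\
& \wedge\, (y, \texttt{rwr:hasPredicate}, ?w) \in \Psi \\
& \wedge\, ((?\omega, \texttt{rdfs:subPropertyOf}, ?w) \in G^n \,\vee\, ?\omega = ?w) \\
& \wedge\, (y, \texttt{rwr:hasObject}, ?x) \in \Psi \\
& \wedge\, (?x, \texttt{rwr:forResource}, ?z) \in \Psi \\
& \wedge\, ((?b, \texttt{rdf:type}, ?z) \in G^n \,\vee\, ?b = ?z) \\
& \wedge\, (O(p)_{n+1} = \emptyset \,\vee\, ?b \in O(p)_{n+1}) \\
& \wedge\, ?b \notin X(p)_{n+1} \,\},
\end{aligned}
$$

$$
\begin{aligned}
\Gamma^-(a,p) = \bigcup_{y \in Y_{in}} \{\, (?b, ?\omega, a) \mid\; & (?b, ?\omega, a) \in G^n \\
& \wedge\, (y, \texttt{rwr:hasPredicate}, ?w) \in \Psi \\
& \wedge\, ((?\omega, \texttt{rdfs:subPropertyOf}, ?w) \in G^n \,\vee\, ?\omega = ?w) \\
& \wedge\, (y, \texttt{rwr:hasSubject}, ?x) \in \Psi \\
& \wedge\, (?x, \texttt{rwr:forResource}, ?z) \in \Psi \\
& \wedge\, ((?b, \texttt{rdf:type}, ?z) \in G^n \,\vee\, ?b = ?z) \\
& \wedge\, (O(p)_{n+1} = \emptyset \,\vee\, ?b \in O(p)_{n+1}) \\
& \wedge\, ?b \notin X(p)_{n+1} \,\},
\end{aligned}
$$

then

$$\Gamma(a,p) = \Gamma^+(a,p) \cup \Gamma^-(a,p).$$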



At the completion of the traversal, $g^p_{n+1} = b$, $g^p_{(n+1)'} = \omega$, $\psi^p_{n+1} = x$, and $\psi^p_{(n+1)'} = w$. If the edge was chosen from $\Gamma^+(a,p)$ then $g^p_{(n+1)''} = +$ and $\psi^p_{(n+1)''} = +$. If the edge was chosen from $\Gamma^-(a,p)$ then $g^p_{(n+1)''} = -$ and $\psi^p_{(n+1)''} = -$. It is always the case that $\forall n : \psi^p_{n''} = g^p_{n''}$.

Note the relationship between $G^n$ and $\Psi$ in the definition of both $\Gamma^-(a,p)$ and $\Gamma^+(a,p)$. It is necessary that the rwr:hasPredicate $?w$ and the rwr:forResource $?z$ as defined in $\Psi$ also exist in $G^n$. It is through the rwr:Traverse rule that the relationship between $\Psi$ and $G^n$ is made explicit and demonstrates how $\Psi$ constrains the path that $p$ can traverse in $G^n$.
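
As an illustration of how the traversal sets might be evaluated, the following Python sketch computes the set of legal triples for a walker located at a vertex. It assumes that $G^n$ is a set of (subject, predicate, object) tuples and that each grammar edge has already been flattened into a hypothetical `(direction, predicate, resource)` tuple, where `'+'` stands for an rwr:OutEdge and `'-'` for an rwr:InEdge. The function names `closure`, `resolves_to`, and `gamma` are illustrative only.

```python
def closure(graph, start, predicate):
    """All resources reachable from `start` by following `predicate` edges (start included)."""
    seen, frontier = {start}, [start]
    while frontier:
        cur = frontier.pop()
        for (s, p, o) in graph:
            if s == cur and p == predicate and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def resolves_to(graph, v, cls):
    """True if v is cls itself or is typed cls under rdfs:subClassOf subsumption."""
    if v == cls:
        return True
    return any(cls in closure(graph, t, "rdfs:subClassOf")
               for (s, p, t) in graph if s == v and p == "rdf:type")


def gamma(graph, a, edges, must_be=frozenset(), must_not=frozenset()):
    """Legal triples for a walker at vertex a.

    edges: (direction, predicate, resource) tuples taken from the grammar.
    must_be / must_not: the rwr:Is set O(p) and rwr:Not set X(p) for the next step.
    """
    legal = set()
    for direction, w, z in edges:
        for (s, omega, o) in graph:
            # omega must equal w or be a sub-property of w.
            if w not in closure(graph, omega, "rdfs:subPropertyOf"):
                continue
            if direction == "+" and s == a:
                b = o
            elif direction == "-" and o == a:
                b = s
            else:
                continue
            if not resolves_to(graph, b, z):
                continue
            if must_be and b not in must_be:   # O(p) empty, or b in O(p)
                continue
            if b in must_not:                  # b not in X(p)
                continue
            legal.add((s, omega, o))
    return legal


# Toy network: one incoming and two outgoing "u" edges at vertex a, plus an unrelated "v" edge.
G = {
    ("j", "u", "a"), ("a", "u", "e"), ("a", "u", "f"), ("a", "v", "h"),
    ("j", "rdf:type", "T"), ("e", "rdf:type", "T"),
    ("f", "rdf:type", "T"), ("h", "rdf:type", "T"),
}
edges = [("+", "u", "T"), ("-", "u", "T")]
legal = gamma(G, "a", edges)
```

Each element of `legal` would then be selected with probability `1 / len(legal)`. Passing a non-empty `must_be` or `must_not` set narrows the result in the way the rwr:Is and rwr:Not attributes prescribe.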

Figure 10 depicts an example of a traversal. In Figure 10, $\Gamma^-(a,p) = \{(j, \omega, a)\}$ and $\Gamma^+(a,p) = \{(a, \omega, e), (a, \omega, f)\}$, where $\Gamma(a,p) = \{(j, \omega, a), (a, \omega, e), (a, \omega, f)\}$, and any one triple is selected with $\frac{1}{3}$ probability.

FIG. 10: An example of the set of edges allowed for traversal by $p$ when $g^p_n = a$.



E. The rwr:IncrCount and rwr:SubmitCounts Rules

The purpose of the rwr:IncrCount and rwr:SubmitCounts rules is to increment the local vertex rank vector $\pi^p$ and global vertex rank vector $\pi$, respectively. While $\pi^p$ is a local variable of $p$, only $\pi$ is returned at the completion of the grammar-based random walker algorithm. The reason for $\pi^p$ is to ensure that prior to incrementing $\pi$, the vertices indexed by $\pi^p$ are in a grammatically correct region of $G^n$ as determined by the grammar $\Psi$. For example, if $p$ is to index a particular lanl:Human, it will do so in $\pi^p$. However, before that lanl:Human is considered legal according to $\Psi$, $p$ may have to check to see if the lanl:Human is lanl:locatedAt the same lanl:University of some previously encountered lanl:Human. Thus, when $p$ has submitted its $\pi^p$ to $\pi$, it will have guaranteed that all the appropriate aspects of its incremented vertices in $\pi^p$ have been validated by $\Psi$. This concept will be made more salient in the example to follow in the next section.

Formally, if $(\phi, \texttt{rdf:type}, \texttt{rwr:Context}) \in \Psi$, $\psi^p_n = \phi$, $g^p_n = i$, and $\phi$ has the rwr:IncrCount rule, then

$$\pi^p_{i(n+1)} = \pi^p_{i(n)} + 1.$$

Next, if $g^p_n = i$, $\psi^p_n = \phi$, $(\phi, \texttt{rdf:type}, \texttt{rwr:Context}) \in \Psi$, and $\phi$ has the rwr:SubmitCounts rule, then

$$\pi_{i(n+1)} = \pi_{i(n)} + \pi^p_{i(n)} \;:\; \forall i \in \pi^p$$

and

$$\pi^p_{i(n+1)} = 0 \;:\; \forall i \in \pi^p.$$

As stated above, once $\pi^p$ has been submitted to $\pi$, the values of $\pi^p$ are set to 0.
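
The two update rules can be illustrated with a small Python sketch. It assumes that both rank vectors are stored as `collections.Counter` objects keyed by vertex; the `Walker` class and the function names are hypothetical conveniences, not part of the grammar ontology.

```python
from collections import Counter


class Walker:
    """Holds the local rank vector pi^p of a single random walker."""

    def __init__(self):
        self.local = Counter()


def incr_count(walker, vertex):
    """rwr:IncrCount: add one to the local count of the current vertex."""
    walker.local[vertex] += 1


def submit_counts(walker, global_rank):
    """rwr:SubmitCounts: add every local count to the global vector, then reset the local vector."""
    global_rank.update(walker.local)   # Counter.update adds counts
    walker.local.clear()


pi = Counter()          # global rank vector
p = Walker()
incr_count(p, "w")
incr_count(p, "y")
before_submit = dict(pi)   # the global vector is untouched until submission
submit_counts(p, pi)

# A walker that halts before reaching a rwr:SubmitCounts rule is simply discarded,
# so its local counts never reach the global vector.
q = Walker()
incr_count(q, "w")
del q
```

Keeping the local vector separate is what lets an ungrammatical path be abandoned without any effect on the global counts.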



F. The rwr:Reresolve Rule

The rwr:Reresolve rule is a way to "teleport" the random walker to some random vertex in $V$ and is perhaps the most complicated rule of the grammar-based random walker ontology. If there exists the rwr:Context $\phi$, $\psi^p_n = \phi$, $\phi$ has the rwr:Reresolve rule $u$, $(u, \texttt{rwr:probability}, ?d) \in \Psi$, and $(u, \texttt{rwr:steps}, ?m) \in \Psi$, then $p$ will have a $(d \cdot 100)\%$ chance of re-resolving its path from $m$ steps ago to the current step $n$, where $d = 0.15$ in most PageRank implementations. If the random walker re-resolves, then the path from $g^p_{n-m}$ to $g^p_n$ is recalculated. In other words, a new path in $G^n$ is determined with respect to the rwr:Contexts $\psi^p_{n-m}$ to $\psi^p_n$ such that no rules are executed and only those attributes specified by the rwr:obeys property are respected.

For example, suppose $\psi^p_{n-m \to n} = (\phi_{(n-m)}, w_{(n-m)+1'}, \pm_{(n-m)+1''}, \ldots, w_{n'}, \pm_{n''}, \phi_n)$ and context $\phi_n$ has a rwr:Reresolve rule, where $\psi^p_n = \phi_n$. If the rwr:Reresolve rule rwr:obeys both the rwr:Is and rwr:Not attributes, then the grammar-based random walker $p$ will re-resolve its history in $G^n$. Thus it will recalculate $g^p_{n-m}$ to $g^p_n$. The set of legal re-resolved paths from step $n - m$ to step $n$ is denoted $Q_{(n-m),n}$. Given that the probability $d$ is met,

$$
\begin{aligned}
Q_{(n-m),n} = \{\, & (?i, ?\omega_{(n-m)+1'}, \pm_{(n-m)+1''}, \ldots, ?b, ?\omega_{n'}, \pm_{n''}, ?j) \mid \\
& (\psi^p_{n-m}, \texttt{rwr:forResource}, ?x) \in \Psi \\
& \wedge\, ((?i, \texttt{rdf:type}, ?x) \in G^n \,\vee\, ?i = ?x) \\
& \wedge\, (O(p)_{(n-m)} = \emptyset \,\vee\, ?i \in O(p)_{(n-m)}) \\
& \wedge\, ?i \notin X(p)_{(n-m)} \\
& \wedge\, ((?\omega_{(n-m)+1'}, \texttt{rdfs:subPropertyOf}, \psi^p_{(n-m)+1'}) \in G^n \,\vee\, ?\omega_{(n-m)+1'} = \psi^p_{(n-m)+1'}) \\
& \wedge\, ((\pm_{(n-m)+1''} = + \,\wedge\, (?i, ?\omega_{(n-m)+1'}, ?a) \in G^n) \\
& \qquad \vee\, (\pm_{(n-m)+1''} = - \,\wedge\, (?a, ?\omega_{(n-m)+1'}, ?i) \in G^n)) \\
& \wedge\, \ldots \\
& \wedge\, ((\pm_{n''} = + \,\wedge\, (?b, ?\omega_{n'}, ?j) \in G^n) \\
& \qquad \vee\, (\pm_{n''} = - \,\wedge\, (?j, ?\omega_{n'}, ?b) \in G^n)) \\
& \wedge\, ((?\omega_{n'}, \texttt{rdfs:subPropertyOf}, \psi^p_{n'}) \in G^n \,\vee\, ?\omega_{n'} = \psi^p_{n'}) \\
& \wedge\, (\psi^p_n, \texttt{rwr:forResource}, ?y) \in \Psi \\
& \wedge\, ((?j, \texttt{rdf:type}, ?y) \in G^n \,\vee\, ?j = ?y) \\
& \wedge\, (O(p)_n = \emptyset \,\vee\, ?j \in O(p)_n) \\
& \wedge\, ?j \notin X(p)_n \,\}.
\end{aligned}
$$

The probability of $p$ choosing some re-resolved path $q \in Q_{(n-m),n}$ is $\frac{1}{|Q_{(n-m),n}|}$, where $g^p_{k''} = q_{k''}$, $g^p_{k'} = q_{k'}$, and $g^p_k = q_k$ for all $k$ such that $n - m \le k \le n$.

While the above equation is perhaps notationally tricky, it has a relatively simple meaning. In short, $p$ must recalculate (or re-resolve) its path from $m$ steps ago to the present step $n$. This recalculation must follow the exact same grammar path denoted in $\psi^p$. Thus, if from $n - m$ to $n$, $p$ had ensured that its current vertex is a lanl:Human that is lanl:locatedAt a lanl:Laboratory, then when $p$ "teleports", the new vertex at $n$ will be guaranteed to also be a lanl:Human that is lanl:locatedAt a lanl:Laboratory.

If there are no rank sinks, this rule guarantees a strongly connected network; any vertex can be reached from any other vertex in the grammatically correct region of $G^n$. However, note that rank sinks are remedied by the next rule.

G. The Empty Rule

Random walker halting occurs when $p$ arrives at some rwr:Context where no rule exists or there are no more rules to execute (e.g. when a rwr:Traverse rule does not provide any transition edges, $\Gamma(a,p) = \emptyset$). At halt points, a new random walker with an empty $\pi^p$ and no $G^n$ or $\Psi$ history (i.e. $|g^p| = 0$ and $|\psi^p| = 0$) enters $G^n$ at some rwr:EntryContext $\phi$ in $\Psi$ and some $i \in s(V \mid \phi)$. The new random walker executes the grammar. Note that the global rank vector $\pi$ remains unchanged.

The combination of the empty rule and the rwr:Reresolve rule is necessary to ensure that $\pi$ is a stationary distribution. Both rules are used in conjunction to support grammar-based PageRank calculations.

In order to demonstrate the aforementioned ideas, the next section presents a particular grammar instance developed for a scholarly network ontology and instance.

VI. A SCHOLARLY NETWORK EXAMPLE

This section will demonstrate the application of grammar-based random walkers to a scholarly semantic network denoted $G^n$. Figure 11 diagrams the ontology of $G^n$ where the tail of the edge is the rdfs:domain and the head of the edge is the rdfs:range. The dashed lines represent the rdfs:subClassOf relationship. This ontology represents the relationships between lanl:Institutions, lanl:Researchers, lanl:Articles, and their respective children classes.

FIG. 11: An example scholarly ontology

The first example calculates the stationary distribution of the subset of $G^n$ that is semantically equivalent to the coauthorship network resulting from lanl:ConferenceArticles written by lanl:Researchers that are lanl:locatedAt a lanl:University only. The second example presents a grammar for calculating the stationary distribution over all vertices in a semantic network irrespective of the edge labels (i.e. an unconstrained grammar). The second example is equivalent to running the single-relational implementation of PageRank on a semantic network.

A. Conference Article Co-Authorship Grammar

Let $\Psi_{coaut}$ denote the grammar for generating a $\pi$ for the subset of $G^n$ that is semantically equivalent to the coauthorship network resulting from lanl:ConferenceArticles for all lanl:Researchers from a lanl:University. $\Psi_{coaut}$ is diagrammed in Figure 12 where, for the sake of convenience, the context names, without the trailing numeric suffix (e.g. the _0 in lanl:University_0), denote the rdfs:Resource pointed to by the rwr:forResource property of the respective rwr:Context. The bolded + or - on the edges denotes whether the rwr:Edge is an rwr:OutEdge or rwr:InEdge, respectively. The dashed square represents an rwr:EntryContext. The stack of rules for each rwr:Context denotes the rdf:Seq of rules ordered from top to bottom and rwr:Context attributes are also stacked (in no particular order) with their respective rwr:Context.

FIG. 12: A grammar to calculate eigenvector centrality on a conference article coauthorship network of university researchers.

A single grammar-based random walker $p \in P$ will begin its journey in $G^n$ at some vertex $i \in s(V \mid \texttt{lanl:University\_0})$, where

$$s(V \mid \texttt{lanl:University\_0}) = \{\, ?i \mid (?i, \texttt{rdf:type}, \texttt{lanl:University}) \in G^n \,\}$$

and the $i \in s(V \mid \texttt{lanl:University\_0})$ is chosen with probability $\frac{1}{|s(V \mid \texttt{lanl:University\_0})|}$. After a vertex in $s(V \mid \texttt{lanl:University\_0})$ is chosen, $g^p_0 = i$ and $\psi^p_0 = \texttt{lanl:University\_0}$. There are 2 sequentially ordered rules at University_0: rwr:SubmitCounts_0 and rwr:Traverse_0. The first rule has no effect on $\pi$ or $\pi^p$ because for all $i$, $\pi^p_{i(0)} = 0$. The rwr:SubmitCounts_0 rule is important on the next time around $\Psi_{coaut}$. With the rwr:Traverse_0 rule, $p$ randomly chooses a single vertex $w$ in

$$
\begin{aligned}
W = \{\, ?w \mid\; & (?w, \texttt{lanl:locatedAt}, i) \in G^n \\
& \wedge\, (?w, \texttt{rdf:type}, \texttt{lanl:Researcher}) \in G^n \,\},
\end{aligned}
$$

where rwr:Is_1 requires that $g^p_1 = g^p_{-1}$ and $g^p_{-1} = \emptyset$ (i.e. $O(p)_1 = \emptyset$). The rwr:Is_1 attribute is important the second time around $\Psi_{coaut}$.
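
The entry step at the start of this walk can be sketched in Python as follows. The sketch assumes $G^n$ is a set of (subject, predicate, object) tuples and covers only the rdf:type branch of $s(V \mid \phi)$ with rdfs:subClassOf subsumption; the branch in which the context's resource is itself the entry vertex is left out. The function names and the toy resources lanl:UNM, lanl:MIT, and lanl:PrivateUniversity are hypothetical.

```python
import random


def subclasses(graph, cls):
    """cls plus every class linked to it by a chain of rdfs:subClassOf edges."""
    found, frontier = {cls}, [cls]
    while frontier:
        cur = frontier.pop()
        for (s, p, o) in graph:
            if p == "rdfs:subClassOf" and o == cur and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def entry_vertices(graph, cls):
    """Vertices typed cls, directly or through a subclass."""
    classes = subclasses(graph, cls)
    return {s for (s, p, o) in graph if p == "rdf:type" and o in classes}


def choose_entry(graph, cls, rng=random):
    """Pick one entry vertex uniformly at random, or None if there is none."""
    candidates = sorted(entry_vertices(graph, cls))
    return rng.choice(candidates) if candidates else None


G = {
    ("lanl:UNM", "rdf:type", "lanl:University"),
    ("lanl:MIT", "rdf:type", "lanl:PrivateUniversity"),
    ("lanl:PrivateUniversity", "rdfs:subClassOf", "lanl:University"),
    ("lanl:LANL", "rdf:type", "lanl:Laboratory"),
}
universities = entry_vertices(G, "lanl:University")
start = choose_entry(G, "lanl:University", random.Random(0))
```

Sorting the candidates before sampling only serves to make a seeded run repeatable; it does not change the uniform choice.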

At time step 1, $g^p_1 = w$ and $\psi^p_1 = \texttt{lanl:Researcher\_1}$. Researcher_1 has the rwr:IncrCount_1 rule and thus, $\pi^p_{w(1)} = 1$. After the rwr:IncrCount_1 rule is executed, $p$ will execute the rwr:Traverse_1 rule. The random walker $p$ will randomly choose some $x$ in

$$
\begin{aligned}
X = \{\, ?x \mid\; & (w, \texttt{lanl:wrote}, ?x) \in G^n \\
& \wedge\, (?x, \texttt{rdf:type}, \texttt{lanl:ConferenceArticle}) \in G^n \,\}.
\end{aligned}
$$

If $x$ is properly resolved, then $g^p_2 = x$ and $\psi^p_2 = \texttt{lanl:ConferenceArticle\_2}$. However, if $w$ has not written a lanl:ConferenceArticle, then $X = \emptyset$. At that point, the rwr:Traverse_1 rule fails and $(i, \texttt{lanl:locatedAt}, -, w)$ is an ungrammatical path in $G^n$ according to $\Psi_{coaut}$. If $X = \emptyset$, a new random walker (i.e. a $p$ with no history and zero $\pi^p$) randomly chooses some entry point into $\Psi_{coaut}$ and $G^n$ and the process begins again. If, on the other hand, $w$ has written some lanl:ConferenceArticle $x$, then $p$ will randomly select a $y$ in

$$
\begin{aligned}
Y = \{\, ?y \mid\; & (?y, \texttt{lanl:wrote}, x) \in G^n \\
& \wedge\, (?y, \texttt{rdf:type}, \texttt{lanl:Researcher}) \in G^n \\
& \wedge\, ?y \neq w \,\}.
\end{aligned}
$$

Note the role of the rwr:Not_3 attribute in Researcher_3. rwr:Not_3 guarantees that the lanl:ConferenceArticle $x$ was written by two or more lanl:Researchers and that only those lanl:Researchers that are not $w$ are selected since $X(p)_3 = \{w\}$. Semantically, this ensures that the subset of $G^n$ that is traversed is a coauthorship network. If $Y = \emptyset$, then $(i, \texttt{lanl:locatedAt}, -, w, \texttt{lanl:wrote}, +, x)$ is an ungrammatical path with respect to $\Psi_{coaut}$. If $Y \neq \emptyset$, then $g^p_3 = y$, $\psi^p_3 = \texttt{Researcher\_3}$, and $\pi^p_{y(3)} = 1$. Finally, because of the rwr:Traverse_3 rule, $p$ randomly selects some $z$ in

$$
\begin{aligned}
Z = \{\, ?z \mid\; & (y, \texttt{lanl:locatedAt}, ?z) \in G^n \\
& \wedge\, (?z, \texttt{rdf:type}, \texttt{lanl:University}) \in G^n \,\}.
\end{aligned}
$$



Thus, $g^p_4 = z$ and $\psi^p_4 = \texttt{University\_0}$. At this point in time, $g^p = (i, \texttt{lanl:locatedAt}, -, w, \texttt{lanl:wrote}, +, x, \texttt{lanl:wrote}, -, y, \texttt{lanl:locatedAt}, +, z)$, $g^p$ is a $\Psi_{coaut}$-correct path, and $w$ and $y$ are indexed by $\pi$. The rwr:SubmitCounts_0 rule ensures that $\pi_{w(4)} = \pi^p_{w(4)}$ and $\pi_{y(4)} = \pi^p_{y(4)}$. Finally, when rwr:SubmitCounts_0 has completed, $\pi^p_{w(4)} = \pi^p_{y(4)} = 0$. This process continues until the ratios between the counts in $\pi$ converge.

At $n = 5$, the rwr:Is_1 attribute is important to ensure that, after checking if the lanl:Researcher $y$ is lanl:locatedAt a lanl:University, $p$ returns to $y$ before locating a lanl:ConferenceArticle written by $y$ and continuing its traversal through the implicit coauthorship network in $G^n$ as defined by $\Psi_{coaut}$.

What is provided by $\pi$ is the number of times a particular vertex in $V$ has been visited over a given number of time steps $n$. If vertex $i \in V$ was visited $\pi_i$ times then the probability of observing a random walker at $i$ is $\frac{\pi_i}{n}$. However, given that $\sum_{i \in V} \pi_i \neq n$ because other vertices not indexed by $\pi$ exist on a $\Psi_{coaut}$-correct path of $G^n$, the probability of the random walker being at vertex $i$ when observing only those vertices indexed by $\pi$ is

$$\pi'_i = \frac{\pi_i}{\sum_{j \in V} \pi_j} \;:\; i \in V.$$

Thus,

$$\sum_{i \in V} \pi'_i = 1.$$

This step is called the normalization of $\pi$ and is necessary for transforming the number of times a vertex in $V$ is visited into the probability that the vertex is being visited at any one time step. When $\|\pi'_{(n)} - \pi'_{(m)}\|_2 \le \epsilon$, where $m < n$ and $m$ and $n$ are consecutive $\pi$ update steps (i.e. consecutive rwr:SubmitCounts), $\pi$ has converged to a range acceptable by the $\epsilon \in \mathbb{R}$ provided argument.
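
The normalization and the convergence test can be written down directly. The following Python sketch assumes the global counts are held in a dictionary keyed by vertex and, as an assumption of this sketch, measures the distance between two normalized vectors with the Euclidean norm; the function names are illustrative.

```python
import math


def normalize(counts):
    """Turn visit counts into probabilities over the indexed vertices."""
    total = sum(counts.values())
    if total == 0:
        return {v: 0.0 for v in counts}
    return {v: c / total for v, c in counts.items()}


def has_converged(prev, curr, epsilon):
    """True if the Euclidean distance between two normalized vectors is at most epsilon."""
    keys = set(prev) | set(curr)
    dist = math.sqrt(sum((curr.get(k, 0.0) - prev.get(k, 0.0)) ** 2 for k in keys))
    return dist <= epsilon


# Counts at two consecutive submit steps with the same ratios.
pi_m = {"w": 3, "y": 1}
pi_n = {"w": 6, "y": 2}
stable = has_converged(normalize(pi_m), normalize(pi_n), epsilon=1e-9)

# Counts whose ratios still differ noticeably.
unstable = has_converged(normalize({"w": 1, "y": 1}), normalize(pi_m), epsilon=0.1)
```

Comparing normalized vectors rather than raw counts matters here because the raw counts keep growing with every submit step even after their ratios have settled.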

However, $\pi$ may never converge if the $p$-traversed subset of $G^n$ is not strongly connected. For instance, let the triple list $A^n$ be defined as

$$
\begin{aligned}
A^n = \{\, (?i, \texttt{lanl:coauthor}, ?y) \mid\; & (?w, \texttt{rdf:type}, \texttt{lanl:University}) \in G^n \\
& \wedge\, (?i, \texttt{lanl:locatedAt}, ?w) \in G^n \\
& \wedge\, (?i, \texttt{lanl:wrote}, ?x) \in G^n \\
& \wedge\, (?x, \texttt{rdf:type}, \texttt{lanl:ConferenceArticle}) \in G^n \\
& \wedge\, (?y, \texttt{lanl:wrote}, ?x) \in G^n \\
& \wedge\, (?y, \texttt{lanl:locatedAt}, ?z) \in G^n \\
& \wedge\, (?z, \texttt{rdf:type}, \texttt{lanl:University}) \in G^n \\
& \wedge\, ?i \neq ?y \,\}.
\end{aligned}
$$

Furthermore, let $V^*$ denote the set of unique lanl:Researcher vertices in $A^n$ and $A \in \mathbb{R}^{|V^*| \times |V^*|}$ be a weighted adjacency matrix where

$$
A_{i,y} =
\begin{cases}
\frac{1}{|\Gamma^+(i)|} & \text{if } (i, \texttt{lanl:coauthor}, y) \in A^n \\
\frac{1}{|V^*|} & \text{if } |\Gamma^+(i)| = 0 \\
0 & \text{otherwise.}
\end{cases}
$$

If $A\pi' = \lambda\pi'$ where $\lambda$ is the largest eigenvalue of $A$, then $\pi'$ is the stationary distribution of $A$ and thus, the $p$-traversed subset of $G^n$ given $\Psi_{coaut}$ is strongly connected. However, most coauthorship networks are not strongly connected [27] and therefore, $\pi'$ may not be a stationary distribution. For example, there may exist some lanl:University denoted $R$ such that only two lanl:Researchers, $x$ and $y$, are lanl:locatedAt $R$, and these two have a coauthor relationship with respect to a particular lanl:ConferenceArticle. If the random walker $p$ happens to enter $G^n$ at $x$, then the random walker will never leave the $x/y$ component. However, some new lanl:Researcher, and therefore some new lanl:University, can be introduced into the problem by re-resolving the lanl:ConferenceArticle uniting $x$ and $y$ such that $p$ teleports to some new researcher $w$ at some other lanl:University $S$. This example is depicted in Figure 13 where the dashed line represents a teleportation by $p$. This teleportation introduces the artificial relationship that $x$ coauthored with $w$. Thus, when there exists a non-zero probability of teleportation at every vertex in $V^*$, the coauthorship network becomes strongly connected.



FIG. 13: Teleportation required for connecting isolated components.



In order to guarantee a strongly connected network, it is possible to simulate the behavior of randomly choosing some new entry point with probability $d \in (0, 1]$ as an analogy to the method of inducing strong connectivity in [33]. The rwr:Reresolve rule is introduced to $\Psi_{coaut}$ at ConferenceArticle_2 where rwr:Reresolve_2 has a rwr:probability of $d = 0.15$, a rwr:steps of $m = 2$, and does not rwr:obey any rwr:Context attributes. $\Psi_{coaut'}$ is diagrammed in Figure 14 where the "0.15" literal is the object of the triple $(\texttt{rwr:Reresolve\_2}, \texttt{rwr:probability}, \text{"0.15"}) \in \Psi_{coaut'}$ and the "2" literal is the object of the triple $(\texttt{rwr:Reresolve\_2}, \texttt{rwr:steps}, \text{"2"}) \in \Psi_{coaut'}$.

FIG. 14: A grammar to calculate PageRank on a conference article coauthorship network of university researchers.

With respect to $G^n$, every time random walker $p$ encounters the lanl:ConferenceArticle_2 context, it has a 15% chance of teleporting to some new lanl:ConferenceArticle $i$ in $V$ such that

$$
\begin{aligned}
Q_{n-2,n} = \{\, (?w, ?x, -, ?y, ?z, +, ?i) \mid\; & (?w, \texttt{rdf:type}, \texttt{lanl:University}) \in G^n \\
& \wedge\, ?x = \texttt{lanl:locatedAt} \\
& \wedge\, (?y, \texttt{rdf:type}, \texttt{lanl:Researcher}) \in G^n \\
& \wedge\, ?z = \texttt{lanl:wrote} \\
& \wedge\, (?i, \texttt{rdf:type}, \texttt{lanl:ConferenceArticle}) \in G^n \\
& \wedge\, (?y, ?x, ?w) \in G^n \\
& \wedge\, (?y, ?z, ?i) \in G^n \,\}
\end{aligned}
$$

and a new path $q \in Q_{n-2,n}$ is chosen with probability $\frac{1}{|Q_{n-2,n}|}$. If $q = (w, x, -, y, z, +, i)$, then

$$
g^p_{(n-2) \to n} =
\begin{cases}
g^p_{(n-2) \to n} & \text{with probability } 1 - d \\
q & \text{with probability } d.
\end{cases}
$$
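
The re-resolution step for this particular grammar can be sketched in Python. The sketch assumes $G^n$ is a set of (subject, predicate, object) tuples, checks only direct rdf:type triples (no subsumption), and uses hypothetical function names; the resources lanl:U1, lanl:r1, lanl:a1 and so on are toy data.

```python
import random


def reresolved_paths(graph):
    """Enumerate Q_{n-2,n}: (university, locatedAt, -, researcher, wrote, +, article)."""
    def typed(v, cls):
        return (v, "rdf:type", cls) in graph

    paths = []
    for (y, x, w) in graph:
        if x != "lanl:locatedAt":
            continue
        if not typed(w, "lanl:University") or not typed(y, "lanl:Researcher"):
            continue
        for (y2, z, i) in graph:
            if y2 == y and z == "lanl:wrote" and typed(i, "lanl:ConferenceArticle"):
                paths.append((w, x, "-", y, z, "+", i))
    return sorted(paths)


def maybe_reresolve(current, graph, d=0.15, rng=random):
    """With probability d, replace the last two steps by a uniformly chosen legal path."""
    candidates = reresolved_paths(graph)
    if candidates and rng.random() < d:
        return rng.choice(candidates)
    return current


G = {
    ("lanl:r1", "lanl:locatedAt", "lanl:U1"), ("lanl:r2", "lanl:locatedAt", "lanl:U2"),
    ("lanl:r1", "lanl:wrote", "lanl:a1"), ("lanl:r2", "lanl:wrote", "lanl:a2"),
    ("lanl:r2", "lanl:wrote", "lanl:j1"),
    ("lanl:U1", "rdf:type", "lanl:University"), ("lanl:U2", "rdf:type", "lanl:University"),
    ("lanl:r1", "rdf:type", "lanl:Researcher"), ("lanl:r2", "rdf:type", "lanl:Researcher"),
    ("lanl:a1", "rdf:type", "lanl:ConferenceArticle"),
    ("lanl:a2", "rdf:type", "lanl:ConferenceArticle"),
    ("lanl:j1", "rdf:type", "lanl:JournalArticle"),
}
Q = reresolved_paths(G)
current = ("lanl:U1", "lanl:locatedAt", "-", "lanl:r1", "lanl:wrote", "+", "lanl:a1")
teleported = maybe_reresolve(current, G, d=1.0, rng=random.Random(1))
kept = maybe_reresolve(current, G, d=0.0)
```

Because the journal article lanl:j1 is not a lanl:ConferenceArticle, no path ending at it appears in `Q`, so a teleporting walker always lands in the grammatically correct region.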



The rwr:Reresolve rule guarantees that any conference publishing university researcher is reachable from any other conference publishing university researcher and thus, the coauthorship network of conference publications by university researchers is strongly connected. Theoretically, the rwr:Reresolve_2 rule ensures that there exists some hypothetical triple list $B^n$, such that

$$B^n = \{\, (?i, \texttt{lanl:teleport}, ?j) \mid ?i \in V^* \,\wedge\, ?j \in V^* \,\},$$

where $V^*$ is the set of lanl:Researchers from $A^n$. Let $B \in \mathbb{R}^{|V^*| \times |V^*|}$ be a weighted adjacency matrix where for any entry in $B$, $B_{i,j} = \frac{1}{|V^*|}$. $\Psi_{coaut'}$ is equivalent to computing $\pi'$ for $C$ where $C = \delta A + (1 - \delta)B$ and $\delta = 1 - d = 0.85$. Therefore, $\pi'$ generated from $\Psi_{coaut'}$ is a stationary distribution.
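
For comparison with the random walker, the stationary distribution of $C$ can also be approximated by power iteration. The Python sketch below assumes the coauthor relation is given as a list of (researcher, researcher) pairs and uses the row-vector convention (the vector is multiplied on the left of $C$), which is an assumption of this sketch rather than something the text fixes; the function names and the toy researchers x, y, z are hypothetical.

```python
def coauthor_matrix(vertices, coauthor_edges):
    """Row-stochastic A: 1/|out(i)| per coauthor edge, a uniform row if i has no edges."""
    n = len(vertices)
    index = {v: k for k, v in enumerate(vertices)}
    A = [[0.0] * n for _ in range(n)]
    for i in vertices:
        out = [y for (s, y) in coauthor_edges if s == i]
        if not out:
            A[index[i]] = [1.0 / n] * n
            continue
        for y in out:
            A[index[i]][index[y]] += 1.0 / len(out)
    return A


def stationary(A, delta=0.85, iterations=200):
    """Power iteration on C = delta * A + (1 - delta) * B with B_ij = 1/n."""
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [0.0] * n
        for i in range(n):
            for j in range(n):
                c_ij = delta * A[i][j] + (1.0 - delta) / n
                nxt[j] += pi[i] * c_ij
        pi = nxt
    return pi


# Toy coauthorship chain: x coauthored with y, and y coauthored with z.
vertices = ["x", "y", "z"]
edges = [("x", "y"), ("y", "x"), ("y", "z"), ("z", "y")]
A = coauthor_matrix(vertices, edges)
rank = stationary(A)
```

In this toy chain the middle researcher y receives the largest share, while x and z receive equal shares by symmetry.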

The eigenvector centrality or PageRank of the network could have been calculated by extracting the appropriate lanl:Researcher vertices from $V$ and generating the implicit lanl:ConferenceArticle coauthorship edge between them. This was done with the network $A^n$ and its "teleportation" network $B^n$, where $A$ and $B$ are the respective adjacency matrix representations of these networks. In this sense, the single-relational eigenvector centrality or PageRank algorithm would generate the same results. However, the grammar-based random walker algorithm is different from the "isolation-based" method. In the grammar-based method, there is no need to generate (i.e. make explicit) the implicit single-relational subset of $G^n$ and thus, create another data structure; the same $G^n$ can be used for different eigenvector calculations without altering it. Thus, multiple different grammars can be running in parallel on the same data set (on the same triple-store). For more complex grammars that involve rwr:Is and rwr:Not constraints over multiple cycles of a grammar, the query to isolate the sub-network becomes increasingly long as recursions cannot be expressed in the standard RDF query language SPARQL [35].

B. Simulating Single-Relational PageRank on a Semantic Network

The grammar depicted in Figure 15 is denoted $\Psi_g$ and is the grammar that calculates $\pi$ on any semantic network without consideration for edge directionality or edge labels. Thus, this grammar is not constrained to the ontology of the semantic network and can be applied to any $G^n$ instance. Furthermore, the rwr:Reresolve rule guarantees that all vertices are reachable from all other vertices. Note that this grammar ensures that all vertices in $V$ are $\Psi$-correct. The presented grammar is equivalent to executing PageRank on an undirected single-relational representation of a semantic network.

FIG. 15: A grammar to calculate an undirected single-relational network PageRank on a semantic network.

VII. ANALYSIS

What has been presented thus far is an ontology for instantiating a $G^n$ specific grammar, the formalization of the rules and attributes that must be respected by a grammar-based random walker, and an eigenvector centrality and PageRank example involving a semantic scholarly network. This section will briefly discuss the various permutations of $G^n$ that are traversed by a grammar-based random walker.

As stated previously, only a subset of the complete semantic network $G^n$ is traversed by any $p \in P$. Let $G^{\psi}$ denote the graph traversed by $p$ according to $\Psi$. It is noted that only a subset of $G^{\psi}$ is considered $\Psi$-correct (i.e. grammatically correct according to $\Psi$). If $p$ is unable to submit its $\pi^p$ to the global vertex vector $\pi$, then $p$ has taken an ungrammatical semantic path in $G^n$. On the other hand, if $p$ contributes its $\pi^p$ to $\pi$, $p$ has taken a grammatical semantic path (i.e. a $\Psi$-correct path). Let $G^{\psi+} \subseteq G^{\psi}$ denote the subset of $G^{\psi}$ that is grammatically correct according to $\Psi$.

Definition 1 (The $\Psi$-Correct Paths of $G^{\psi+}$) The path $g^p_{m \to n}$ in $G^n$ is considered grammatically correct with respect to $\Psi$ if and only if $\psi^p_m$ is an rwr:EntryContext or an rwr:Context with an rwr:SubmitCounts rule, $\psi^p_n$ is an rwr:Context with an rwr:SubmitCounts rule, and there exists some rwr:IncrCount rule at time $k$, such that $m < k < n$. The set of all grammatically correct paths forms the semantic network $G^{\psi+}$, where $G^{\psi+} \subseteq G^{\psi} \subseteq G^n$.

The grammatically correct path $g^p_{m \to n}$ ensures that some vertex in $g^p_{m \to n}$ was validated by the grammar $\Psi$ and indexed by $\pi$.

Figure 16 demonstrates a subset of $G^n$ that is traversed by $P$ to generate $G^{\psi+}$, where the bold labeled vertices are those indexed by $\pi$.

FIG. 16: $G^{\psi+}$ as the $\Psi$-correct subset of $G^n$.

Note that the vertices indexed by $\pi$ are not necessarily all of the vertices encountered by the random walkers in $G^{\psi+}$. Similar to the coauthor example presented previously, while a $p \in P$ traverses vertices of type lanl:Article, lanl:University, and lanl:Researcher, only lanl:Researcher vertices are indexed by $\pi$. Thus, those vertices indexed by $\pi$ form an "implied" network. The $G^{\psi+}$ represented in Figure 16 has the implied network $G^{\pi}$ as diagrammed in Figure 17. The probabilities on the edges are given by branches between the respective vertices in $G^{\psi+}$.

FIG. 17: $G^{\pi}$ as the implied network of $G^{\psi+}$.

Theorem 1 If $G^{\psi+}$ is strongly connected and aperiodic, then $\pi'$ is a stationary probability distribution.

Proof. If $G^{\psi+}$ is strongly connected, then every vertex in $G^{\psi+}$ is reachable from any other vertex. Given that $\pi$ indexes a subset of the vertices in $G^{\psi+}$ and the vertices in $G^{\pi}$ are reachable by means of the edges in $G^{\psi+}$, then the vertices indexed by $\pi$ are strongly connected. Thus, the normalization of $\pi$, $\pi'$, is a stationary probability distribution. □

Note that the above does not generalize to $G^n$. If $G^n$ is strongly connected, that does not guarantee that the grammar will permit the grammar-based random walker $p$ to traverse a subset of $G^n$ that is strongly connected. For example, imagine the network $G^n$ depicted in Figure 18. Even if $G^n$ is a strongly connected network, the $\Psi$-correct subgraph of $G^n$ traversed may not be.

FIG. 18: A strongly connected $G^n$ does not guarantee a strongly connected $G^{\psi+}$.

Finally, if the path distance between the vertices in $G^{\pi}$ is equal in $G^{\psi+}$, then $\pi'$ is the primary eigenvector of $G^{\pi}$. However, this is not always the case. Figure 19 demonstrates that the timing between indexing the different vertices in the network diagrammed in Figure 16 is different for different paths chosen by $p$.

FIG. 19: The variability of index delay times for $G^{\pi}$

Theorem 2 If the paths in $G^{\psi+}$ between the vertices indexed by $\pi$ are of equal length, then $\pi'$ is the primary eigenvector of $G^{\pi}$.

Proof. If the path lengths in $G^{\psi+}$ between the vertices indexed by $\pi$ are of equal length, then the intervening non-$\pi$ vertices in $G^{\psi+}$ can be removed without interfering with the relative timing of respective increments to the vertices in $\pi$. Given this network manipulation, a single-relational eigenvector centrality algorithm on the single-relational network $G^{\pi}$ would yield $\pi'$. Thus, $\pi'$ is the primary eigenvector of the network $G^{\pi}$. □

VIII. CONCLUSION

There is much disagreement as to the high-level meaning of the primary eigenvector of a network. $\pi$ has been associated with concepts such as "prestige", "value", "importance", etc. For Markov chain analysis, when vertices represent states of a system, the meaning is clear; $\pi$ defines the probability that at some random time $n$, the system $G^1$ will be at some particular state in $V$, where more "central" states (i.e. those with a higher $\pi$ probability) are more likely to be seen.

However, $\pi$ has also been applied to more abstract concepts of centrality, such as "value", in the area of the web citation network. If the web is represented as a Markov chain, then $\pi_i$ defines the probability that some random web surfer will be at a particular web page $i$ at some random time $n$. Does this phenomenon denote that web pages with a higher $\pi$ probability are more "valuable" than those with lower $\pi$ probabilities? For the many of us who use Google daily, it does [T^]. However, for other artifact networks, $\pi$ can have a completely different meaning.

In journal usage networks, $\pi$ tends to be a component which makes a distinction between applied and theoretical journals, not "value" or "prestige" [8]. On the other hand, the $\pi$ calculated for a journal citation network does provide us with the notion of "prestige" [7]. This demonstrates that $\pi$ has a different meaning depending on the semantics of the edges traversed. In other words, different grammars provide different interpretations of $\pi$.

Whether $\pi$ represents "value" or some other dimension of distinction, this article has provided a method for calculating various $\pi$ vectors in subsets of the semantic network $G^n$ by means of a random walker algorithm constrained to a grammar. For researchers with network-based data sets containing heterogeneous entity types and heterogeneous relationship types, this article may provide a more intuitive way of studying the various $\pi$s of $G^n$.